Adding High-Precision Links to Wikipedia
Authors
Abstract
Wikipedia’s link structure is a valuable resource for natural language processing tasks, but only a fraction of the concepts mentioned in each article are annotated with hyperlinks. In this paper, we study how to augment Wikipedia with additional high-precision links. We present 3W, a system that identifies concept mentions in Wikipedia text, and links each mention to its referent page. 3W leverages rich semantic information present in Wikipedia to achieve high precision. Our experiments demonstrate that 3W can add an average of seven new links to each Wikipedia article, at a precision of 0.98.
Related resources
Adapting Wikification to Cultural Heritage
Large numbers of cultural heritage items are now archived digitally along with accompanying metadata and are available to anyone with internet access. This information could be enriched by adding links to resources that provide background information about the items. Techniques have been developed for automatically adding links to Wikipedia to text but the methods are general and not designed f...
Classifying Wikipedia Articles into NE's Using SVM's with Threshold Adjustment
In this paper, a method is presented to recognize multilingual Wikipedia named entity articles. This method classifies multilingual Wikipedia articles using a variety of structured and unstructured features and is aided by cross-language links and features in Wikipedia. Adding multilingual features helps boost classification accuracy and is shown to effectively classify multilingual pages in a ...
Focused Search in Books and Wikipedia: Categories, Links and Relevance Feedback
In this paper we describe our participation in INEX 2009 in the Ad Hoc Track, the Book Track, and the Entity Ranking Track. In the Ad Hoc track we investigate focused link evidence, using only links from retrieved sections. The new collection is not only annotated with Wikipedia categories, but also with YAGO/WordNet categories. We explore how we can use both types of category information, in t...
YAWN: A Semantically Annotated Wikipedia XML Corpus
The paper presents YAWN, a system to convert the well-known and widely used Wikipedia collection into an XML corpus with semantically rich, self-explaining tags. We introduce algorithms to annotate pages and links with concepts from the WordNet thesaurus. This annotation process exploits categorical information in Wikipedia, which is a high-quality, manually assigned source of information, extr...
280 Birds with One Stone: Inducing Multilingual Taxonomies from Wikipedia using Character-level Classification
We propose a simple, yet effective, approach towards inducing multilingual taxonomies from Wikipedia. Given an English taxonomy, our approach leverages the interlanguage links of Wikipedia followed by character-level classifiers to induce high-precision, high-coverage taxonomies in other languages. Through experiments, we demonstrate that our approach significantly outperforms the state-of-the-a...